Acoustic Scene Classification using Time-Delay Neural Networks and Amplitude Modulation Filter Bank Features
نویسندگان
چکیده
This paper presents a system for acoustic scene classification (SC) that is applied to data of the SC task of the DCASE’16 challenge (Task 1). The proposed method is based on extracting acoustic features that employ a relatively long temporal context, i.e., amplitude modulation filer bank (AMFB) features, prior to detection of acoustic scenes using a neural network (NN) based classification approach. Recurrent neural networks (RNN) are well suited to model long-term acoustic dependencies that are known to encode important information for SC tasks. However, RNNs require a relatively large amount of training data in comparison to feed-forward deep neural networks (DNNs). Hence, the time-delay neural network (TDNN) approach is used in the present work that enables analysis of long contextual information similar to RNNs but with training efforts comparable to conventional DNNs. The proposed SC system attains a recognition accuracy of 76.5 %, which is 4.0 % higher compared to the DCASE’16 baseline system.
منابع مشابه
A New Method to Improve Automated Classification of Heart Sound Signals: Filter Bank Learning in Convolutional Neural Networks
Introduction: Recent studies have acknowledged the potential of convolutional neural networks (CNNs) in distinguishing healthy and morbid samples by using heart sound analyses. Unfortunately the performance of CNNs is highly dependent on the filtering procedure which is applied to signal in their convolutional layer. The present study aimed to address this problem by a...
متن کاملImproved Automatic Speech Recognition Using Subband Temporal Envelope Features and Time-Delay Neural Network Denoising Autoencoder
This paper investigates the use of perceptually-motivated subband temporal envelope (STE) features and time-delay neural network (TDNN) denoising autoencoder (DAE) to improve deep neural network (DNN)-based automatic speech recognition (ASR). STEs are estimated by full-wave rectification and low-pass filtering of band-passed speech using a Gammatone filter-bank. TDNNs are used either as DAE or ...
متن کاملOn the use of Textural Features and Neural Networks for Leaf Recognition
for recognizing various types of plants, so automatic image recognition algorithms can extract to classify plant species and apply these features. Fast and accurate recognition of plants can have a significant impact on biodiversity management and increasing the effectiveness of the studies in this regard. These automatic methods have involved the development of recognition techniques and digi...
متن کاملDeep Sequential Image Features for Acoustic Scene Classification
For the Acoustic Scene Classification task of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE2017), we propose a novel method to classify 15 different acoustic scenes using deep sequential learning, based on features extracted from Short-Time Fourier Transform and scalogram of the audio scenes using Convolutional Neural Networks. It is the first time...
متن کاملInvestigating modulation spectrogram features for deep neural network-based automatic speech recognition
Deep neural network (DNN) based acoustic modelling has been shown to yield significant improvements over Gaussian Mixture Models (GMM) for a variety of automatic speech recognition (ASR) tasks. In addition, it is also becoming popular to use rich speech representations, such as full-resolution spectrograms and perceptually motivated features, as input to the DNNs as they are less sensitive to t...
متن کامل